Protocol for preparing Drosophila genomic DNA to create chromosome-level de novo genome assemblies

Summary De novo genome assemblies are common tools for examining novel biological phenomena in non-model organisms. Here, we present a protocol for preparing Drosophila genomic DNA to create chromosome-level de novo genome assemblies. We describe steps for high-molecular-weight DNA preparation with phenol or Genomic-tips, quality control, long-read nanopore sequencing, short-read DNA library preparation, and sequencing. We then detail procedures of genome assembly, annotation, and assessment that can be used for downstream comparison and functional analysis. For complete details on the use and execution of this protocol, please refer to Sperling et al.1

The protocol below describes the DNA preparation, genome analysis, and quality control measures used for fly genome preparation and comparison. However, it can easily be adapted for genomes of other species. Decide which DNA preparation method is more suitable for your purpose, given the results you are aiming for. For higher coverage of repetitive regions, the Genomic-tip preparation method works very well. However, a more contiguous genome can be assembled from DNA prepared with the phenol method. The phenol method carries a risk of phenol carryover, which can interfere with certain types of sequencing, namely Oxford Nanopore Technologies (ONT/Nanopore) sequencing; we have added troubleshooting tips on how to avoid this. The Genomic-tip method does not have the phenol carryover risk.

High molecular weight DNA preparation
High molecular weight DNA shears easily, so it must be handled with care. Low-bind plastics help prevent shearing of the DNA. It is also best to use wide-bore tips when pipetting solutions containing DNA. Although less ideal, in a pinch it is also possible to cut the ends off your tips to make them wide-bore. Do not shake tubes containing DNA vigorously, and never vortex them unless specifically indicated.
other strain is sexually reproducing (wild-type) from an ancestral habitat (Brazil) with the DNA prepared using the Genomic-tip (QIAGEN) method.
High molecular weight DNA preparation with phenol or Genomic-tips
High molecular weight DNA isolation with phenol
Timing: 3 days
This section describes the isolation of high molecular weight DNA using the phenol extraction method. This method is a mixed protocol adapted from two other DNA preparation methods.19,20
Dissociate tissue.
a. Homogenize sample.
i. Add 150 µL of sodium dodecyl sulfate (SDS) buffer to a 1.5 mL tube containing 60 mg of sample (in our case 30-60 flies).
ii. Break apart the sample using a handheld electric pestle mixer for 10 s.
Note: This may need to be optimized for different sample types.
iii. Add 350 µL of SDS buffer to the homogenate.
iv. Mix by inverting the tube gently 5 times.
v. Incubate at 37°C for 4 h without agitation.

Note: The incubation time can be decreased to 2 h or increased up to 16 h. We had the best preparation with a 4-h incubation.
b. Eliminate RNA.
i. Add 5 µL of RNase A (100 mg/mL) to the 1.5 mL tube containing the homogenate.
ii. Mix by inverting the tube gently 5 times.
iii. Incubate at 50°C for 2 h without agitation.
c. Digest protein.
ii. Mix by inverting the tube gently 5 times.
a. First extraction.
i. Add 240 µL of the phenol layer from phenol/chloroform/isoamyl alcohol (25:24:1) to the 1.5 mL tube containing the homogenate. Troubleshooting 1.
ii. Mix gently for 3 min at 21°C (room temperature).
iii. Centrifuge at 12,000 × g for 10 min.
iv. Decant supernatant into a new tube.
b. Second extraction.
i. Add 240 µL of the phenol layer from phenol/chloroform/isoamyl alcohol (25:24:1) to the 1.5 mL tube containing the supernatant.
ii. Mix gently for 3 min at 21°C (room temperature).
iii. Centrifuge at 12,000 × g for 10 min.
iv. Decant supernatant into a new tube.
c. Third extraction.
i. Add 240 µL of the chloroform layer from phenol/chloroform/isoamyl alcohol (25:24:1) to the 1.5 mL tube containing the supernatant.
ii. Mix gently for 3 min at 21°C (room temperature).
iii. Centrifuge at 12,000 × g for 10 min.
iv. Decant supernatant into a new tube.
d. Precipitate and isolate DNA.
i. Add 500 µL of −20°C absolute ethanol.
ii. Precipitate at −20°C for at least 5 min.
Note: This is a potential pause point, where DNA can be stored for 16 h (overnight) at −20°C.
iii. Remove DNA by spooling with a bent pipette tip.

Note: The DNA is visible (Figure 1).
iv. Place DNA into a new tube with 80% ethanol.
v. Centrifuge at 12,000 × g for 3 min to remove residual salt.
vi. Decant the DNA pellet.
vii. Place tube at 37°C for approximately 30 min (it is important not to leave the DNA on heat longer than necessary). Troubleshooting 2.
viii. Add 50 µL of elution buffer.
ix. Leave DNA at 21°C (room temperature) for 16 h (overnight) and then at 4°C for a minimum of 2 days prior to quality assessment and sequencing.
CRITICAL: Avoid pipetting the DNA when possible; pour the DNA over or use wide-bore, low-bind pipette tips.
CRITICAL: Phenol/chloroform/isoamyl alcohol is toxic and should be used in a fume hood with the appropriate protective equipment. Do not use phenol if it has taken on an orange or red color.

Timing: 2 days
This is a modified protocol for high molecular weight DNA isolation with QIAGEN Genomic-tips.

Disrupt cells.
a. Prepare the lysis buffer by adding 19 µL RNase A to 9.5 mL of Buffer G2.
i. Mix and set aside.
b. Place 60 mg of sample (30-60 flies) in a 1.5 mL tube.
i. Place the 1.5 mL tube with the flies in liquid nitrogen (N2).
ii. Place the pestle tip in liquid N2. Troubleshooting 3.
iii. While the bottom of the tube is in contact with the N2 (use tongs), slowly grind the flies manually until broken up.
iv. Blend the broken-up flies with a handheld electric pestle mixer until they form a fine powder.
Note: For the best results the tube must remain partially submerged in the N2 in order to prevent condensation from forming on the sample.
Note: This step will indicate how fragmented the DNA is and is important for deciding whether to go forward with the sequencing.
CRITICAL: Take care when handling ethidium bromide; it is a potential carcinogen.
Long-read nanopore sequencing
Timing: 2-3 days
Follow the protocol provided by Nanopore.
a. Prepare the samples for sequencing using the Nanopore ligation protocol provided with the current version.

Note: We used SQK LSK-109 for the parthenogenetic D. mercatorum genome and SQK LSK-110 for the sexually reproducing D. mercatorum genome.
9. Note: Optional Agilent TapeStation.
a. Run the library on the TapeStation to ensure it is not fragmented after the library is prepared.
Short-read DNA library preparation and sequencing
Timing: 2 days
A common step in preparing genome assemblies is polishing the assembly with short-read genome data produced using another method. Hi-C is most often used for polishing because it will give a better chromosome-level assembly.2][23][24] Hi-C is still recommended for obtaining better resolution of the non-coding regions of Drosophila genomes. Here we use Illumina sequencing technology to sequence a whole-genome DNA preparation, in combination with transcriptomics data, to polish the genome assembly. The same respective methods used to prepare genomic DNA for long-read sequencing were used for the short-read DNA preparation. However, for the short-read library preparation only a single fly was used for the genomic DNA preparation.
11. Library preparation can be done with almost any commercially available kit.
Note: We used the KAPA HyperPrep Kits for NGS DNA Library Prep by Roche for the parthenogenetic genomes and the NEBNext Ultra II DNA Library Prep Kit for Illumina by New England Biolabs for the sexually reproducing genome.
a. Follow the manufacturer's protocol in preparing the libraries.
12. Library quality control (the same as for high molecular weight DNA preparation).
a. Quantify concentration using NanoDrop.
b. Quantify concentration using Qubit.
c. Check the quality of the library using the Agilent TapeStation or Agilent Bioanalyzer (Figure 2).
Note: We recommend using the TapeStation if possible due to the ease of use and longer lifetime of the reagents.
13. Sequencing can be done on any of the Illumina platforms.
Note: We used both the MiSeq and the NovaSeq.

Note: For our experiments we also polished the genome using transcriptomics data from our cells of interest. This was beneficial for comparison purposes between our genes of interest.

Genome assembly and assessment
Timing: 5 days on a high-performance cluster
This summarizes the steps required to build and evaluate Drosophila mercatorum genome assemblies.
The same process was used for both the sexually reproducing and parthenogenetic Drosophila mercatorum genomes. After the initial assembly these two genomes were compared to each other and to the Drosophila melanogaster reference genome. All the code used to perform the analysis is also publicly available on GitHub (see Data and code availability section).
14. Assemblies.
a. Generate the assemblies with wtdbg2,2 minimap2,3 and Samtools4 from the Nanopore data. Troubleshooting 4.
b. Polish using Illumina data from the same isolate.
i. Perform the alignment.
iv. Apply the polishing with bcftools consensus.
Note: Polished chromosome-level genome assemblies were created, and the assembly quality was assessed using standard metrics of NG50, coverage, and genome size (Table 1), all of which indicated that the genome sequences were of similar or greater quality than other de novo Drosophila genome assemblies. The apparent larger size of the sexually reproducing genome, which shows high representation of repetitive sequence, likely reflects different DNA preparation methods resulting in the overall size of the contigs (NG50) being larger.
c. Compute summary statistics on top of the outputs of the alignments.
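The NG50 metric mentioned above can be illustrated with a minimal sketch. This is not the code used in the published analysis; the function name `ng50`, the contig lengths, and the genome size below are hypothetical values chosen only to show how the metric is defined (the length of the contig at which the cumulative length, sorted longest first, reaches half of an estimated genome size).

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are sorted longest first; NG50 is the length of the contig at
    which the running total first reaches half of the estimated genome size.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None  # the assembly spans less than half of the estimated genome


# Hypothetical contig lengths (bp) and an assumed genome size estimate (bp).
example_contigs = [50, 40, 30, 20, 10]
example_genome_size = 200
print(ng50(example_contigs, example_genome_size))
```

Unlike N50, which uses the total assembly length, NG50 uses an external genome size estimate, so it stays comparable between assemblies of different total size.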

Note: The comparison of the D. mercatorum genome assemblies with the D. melanogaster reference genome showed that the contigs match single chromosome arms (Figures 3A and 3B). The content of chromosome arms is largely conserved between Drosophila species.21-24 There was 75.5% (Figure 3A) and 75.8% (Figure 3B
Note: For the parthenogenetic and sexually reproducing D. mercatorum genome assemblies there was uniform coverage for the larger contigs (Figure 4). However, the smaller contigs did show variation in coverage. This was more pronounced in the sexually reproducing D. mercatorum genome than in the parthenogenetic D. mercatorum genome. This indicates that there may be haplotype-specific contigs or that there may be genome contamination from one of the commensal organisms present in Drosophila lab cultures.
17. Self-heterozygosity analysis.
a. Establish estimates for pairwise heterozygosity in each strain.
We aligned the Illumina data used for polishing back to the assemblies.
D. mercatorum genome, which is adequate for our analyses.
We estimated the pairwise heterozygosity with the Illumina data and found that within-strain variation was low. There were 109,035 heterozygous single nucleotide polymorphisms (SNPs) in the sexually reproducing D. mercatorum genome, and since the genome is 171,182,504 bp the pairwise heterozygosity is estimated at 0.0637% for SNPs. By contrast, there were 16,474 heterozygous SNPs in the parthenogenetic D. mercatorum genome, and since the genome is 161,570,079 bp the pairwise heterozygosity is estimated at 0.0102% for SNPs.
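The heterozygosity percentages reported in this step follow from dividing the heterozygous SNP count by the assembly size. The short sketch below reproduces that arithmetic from the numbers in the text, reading the parthenogenetic genome size as 161,570,079 bp; the function name is hypothetical and this is not the authors' analysis code.

```python
def pairwise_heterozygosity_percent(het_snps, genome_size_bp):
    """Heterozygous SNPs per base pair, expressed as a percentage."""
    return 100.0 * het_snps / genome_size_bp


# SNP counts and genome sizes as reported in the text.
sexual = pairwise_heterozygosity_percent(109_035, 171_182_504)
parthenogenetic = pairwise_heterozygosity_percent(16_474, 161_570_079)

print(f"sexually reproducing: {sexual:.4f}%")
print(f"parthenogenetic:      {parthenogenetic:.4f}%")
```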
18. Inter-strain divergence.
a. Repeat the above analysis, but for each strain align against the other D. mercatorum assembly and then call variants.
b. Normalize.
c. Measure the number of non-reference homozygous SNPs (not heterozygous) for the parthenogen.

Note: The genomes contained 0.82%-0.84% SNPs when compared to each other.
19. K-mer spectrum analysis using Meryl and Merqury.9
a. Build Meryl databases for the read sets.
c. Plot the k-mer copy number spectra for both assemblies, using R.
Note: There were no haplotype-specific contigs, as shown by the single peak present in the k-mer multiplicity plots (Figure 5).
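To make the idea of a k-mer multiplicity spectrum concrete, the following is a toy sketch in plain Python. It is not Meryl or Merqury and does not reproduce their file formats or options; the function names and the example reads are hypothetical. It counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) and then tallies how many distinct k-mers occur at each multiplicity, which is the quantity plotted in a spectrum.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_kmers(seq, k):
    """Yield the canonical form of every k-mer that contains only A/C/G/T."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield min(kmer, reverse_complement(kmer))


def kmer_spectrum(reads, k):
    """Map multiplicity -> number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        counts.update(canonical_kmers(read.upper(), k))
    return Counter(counts.values())


# Two identical hypothetical reads; real data would use k = 19 as in Figure 5.
print(dict(kmer_spectrum(["ACGTAC", "ACGTAC"], 3)))
```

In a real spectrum, a single dominant peak near the average read depth is what would be expected for a largely homozygous genome, whereas an additional peak at about half that depth would suggest heterozygous or haplotype-specific sequence.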

Genome annotation and assessment
Timing: 5 days on a high-performance cluster
This summarizes the steps required to annotate and evaluate Drosophila mercatorum genome assemblies. After the steps below are complete, the genome is ready for functional work and any transcriptomics data can be mapped. However, we also used our transcriptomics data to aid in refining the annotation at the loci of interest. The same process was used for both the sexually reproducing and parthenogenetic Drosophila mercatorum genomes described above. All the code used to perform the analysis is also publicly available on GitHub (see Data and code availability section).
20. Identify repetitive sequences in the genome using RepeatModeler10 and mask them using RepeatMasker.11
a. Call repeats de novo using RepeatModeler.
x <- read.delim('Sample_1.merqury.Sample_1.spectra-cn.hist')
x$Copies <- as.
Note: Hard-masking completely removes repetitive sequences whereas soft-masking keeps them in the file but indicates the repeats are there. We used soft-masking for the annotation and hard-masking later in the protocol in Step 25a.
e. Ab initio gene prediction with BRAKER2.13 Troubleshooting 5.
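The difference between soft-masking and hard-masking can be shown on a toy sequence. The sketch below is an illustration only and is not RepeatMasker; the function names, the sequence, and the repeat interval are hypothetical. It assumes the common convention that soft-masked repeats are written in lowercase and hard-masked repeats are replaced with N.

```python
def soft_mask(seq, intervals):
    """Lowercase the bases inside each half-open (start, end) repeat interval."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


def hard_mask(seq, intervals):
    """Replace the bases inside each half-open (start, end) repeat interval with N."""
    chars = list(seq.upper())
    for start, end in intervals:
        for i in range(start, min(end, len(chars))):
            chars[i] = "N"
    return "".join(chars)


# Hypothetical sequence with one annotated repeat at positions 2-5 (0-based).
sequence = "ACGTACGTAC"
repeats = [(2, 6)]
print(soft_mask(sequence, repeats))
print(hard_mask(sequence, repeats))
```

Soft-masking keeps the underlying bases available to tools that choose to read them, which is why it suits gene prediction, while hard-masking discards that sequence information.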

Note: The number of genes predicted by BRAKER2 is shown in Table 2.
22. Link the BRAKER2 de novo annotations with the D. melanogaster annotations.
a. CDS extraction from BRAKER.gtf.
b. Plot number of genes per contig using ggplot.
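The per-contig gene counts that feed the plot in this step could be tallied along the following lines. This is a rough sketch rather than the authors' code: the function name is hypothetical, and it assumes that CDS rows in the GTF carry a `gene_id "..."` attribute in the ninth column, which should be checked against the actual BRAKER.gtf before use.

```python
import re
from collections import defaultdict

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_per_contig(gtf_lines):
    """Count distinct gene IDs that have at least one CDS row, per contig."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue  # only CDS features are considered here
        match = GENE_ID.search(fields[8])
        if match:
            genes[fields[0]].add(match.group(1))
    return {contig: len(ids) for contig, ids in genes.items()}


# Hypothetical GTF rows (tab-separated) for illustration only.
example = [
    'ctg1\tAUGUSTUS\tCDS\t10\t90\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'ctg1\tAUGUSTUS\tCDS\t120\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";',
    'ctg1\tAUGUSTUS\tCDS\t500\t800\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";',
    'ctg2\tAUGUSTUS\tCDS\t5\t300\t.\t+\t0\ttranscript_id "g3.t1"; gene_id "g3";',
]
print(genes_per_contig(example))
```

The resulting dictionary can be written to a table and plotted with any plotting tool, including ggplot as used in the protocol.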
Problem 2
DNA heating (related to Step 2dvii): DNA is less stable when heated.

Potential solution
We heated our sample at body temperature for a limited time in order to avoid leaving it to dry for 16 h (overnight) at 21 C (room temperature), which in our hands resulted in lower molecular weight samples.The heating may cause the DNA to break, therefore it is best to check if this heating is a problem for one's samples and balance this against the conditions of the lab overnight (ambient temperature and exposure to contaminants).

Problem 3
Fly ice cube (related to Step 3bii): the broken-up pieces of flies form an ice cube instead of a powder.

Potential solution
This is caused by condensation forming on the flies as they are being blended. This should be avoided because it can rupture the nuclei and break apart the DNA. To avoid this, work quickly and keep the tube with the flies in contact with the liquid N2 for the entire duration of the blending process.

Problem 4
Bad assembly (related to Step 14a): An assembly with metrics that are much lower or higher than expected.

Potential solution
We used wtdbg2 as our assembler; we did not get good results when we used the assembler we use more often, Shasta.25 Therefore, we suggest trying a few assemblers, since the results may differ depending on the repetitive content of the genomes you are assembling and the type of genome the assembler was designed for. We ended up choosing an assembler that gave us a more contiguous assembly, and we then physically mapped the contigs using in situ hybridization and checked the chromosome arm content against D. melanogaster in order to be confident that the assembly was correct.1

Problem 5
Mixed assembly (related to Step 21e): We found that the sexually reproducing genome assembly contained contigs at lower-than-expected coverage, which we suspected to be non-Drosophila DNA. There were small contigs that were densely packed with genes, which is not typical of metazoan genomes. The D. mercatorum genomic DNA was prepared from whole animals and therefore potentially included the genomes of commensal organisms.

Potential solution
We used the basic local alignment search tool (BLAST) to determine what organism(s) the genes on all the small contigs came from. There are other tools available (such as BlobTools), but this simple method was more than adequate for our purpose. We found a common commensal gut bacterium to be our only contaminant. There were more contigs from the commensal organism in the assembly that was prepared using the Genomic-tip method than in the one prepared with the phenol method. Therefore, to avoid this contamination, the phenol extraction method is the better option.

Materials availability
This study did not generate new unique reagents.
STAR Protocols 5, 102974, June 21, 2024
genomics-related analyses. A.L.S., D.K.F., E.G., and D.M.G. contributed to choices in methodology and experimental design. Funding for the genomics analysis was obtained by E.G. The funding for all biological experiments was obtained by A.L.S. and D.M.G.
Note on storage conditions: 21 C (room temperature), maximum storage time 6 months.

Figure 1. Precipitated genomic DNA. The white arrow indicates the precipitated DNA.

10. Prepare flow cell and start sequencing.
a. Prime the flow cell for sequencing with the protocol provided for the flow cell.
b. Load the sample on the flow cell and initiate the sequencing run.

Figure 2. Short-read D. mercatorum genome library quality control
(A) Bioanalyzer results for the parthenogenetic genome library. There is a single sample peak along with the lower and upper markers. The average size of the library is 635 bp.
(B) TapeStation results for the sexually reproducing genome library. There is a single sample peak along with the lower and upper markers. The average size of the library is 513 bp.

c. Get a summary of the coverage statistics, weighted on a per-base pair (not per-contig) level.
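The distinction between per-base-pair and per-contig weighting can be illustrated with a small sketch. It is not the published analysis code; the function names and the example contigs are hypothetical. Each contig is represented as a (length in bp, mean read depth) pair, and the per-base mean weights each contig's depth by its length so that many small contigs do not dominate the summary.

```python
def per_base_mean_coverage(contigs):
    """Mean depth weighted by contig length (each base pair counts once)."""
    total_bases = sum(length for length, _ in contigs)
    if total_bases == 0:
        raise ValueError("no bases to summarize")
    return sum(length * depth for length, depth in contigs) / total_bases


def per_contig_mean_coverage(contigs):
    """Unweighted mean depth (each contig counts once, regardless of size)."""
    if not contigs:
        raise ValueError("no contigs to summarize")
    return sum(depth for _, depth in contigs) / len(contigs)


# Hypothetical assembly: one long contig at 10x and one short contig at 110x.
example_contigs = [(90, 10.0), (10, 110.0)]
print(per_base_mean_coverage(example_contigs))
print(per_contig_mean_coverage(example_contigs))
```

In this toy case the short, high-depth contig pulls the per-contig mean far above the per-base mean, which is the distortion that per-base weighting is meant to avoid.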

Figure 3. Genome assembly comparisons
(A) Parthenogenetic D. mercatorum genome compared to the D. melanogaster reference genome (release 6).
(B) Sexually reproducing D. mercatorum genome compared to the D. melanogaster reference genome (release 6).
(C) Parthenogenetic D. mercatorum genome compared to the sexually reproducing D. mercatorum genome.
Purple dots/lines represent sequences matching against the forward strand and blue the reverse. The red arrows indicate inversions. These images were originally published by Sperling et al.1

Figure 4. Genome coverage
(A) Coverage for the sexually reproducing D. mercatorum genome.
(B) Coverage for the sexually reproducing D. mercatorum genome.
These images were originally published by Sperling et al.1

Figure 5. Haplotype analysis
(A and B) Parthenogenetic and sexually reproducing D. mercatorum genome 19-bp k-mer copy number spectra showing a single homozygous peak (1× in assembly, and average read depth in the Illumina data).
These images were originally published by Sperling et al.1

Figure 8. Assessment of genome assembly completeness by comparison against the Diptera-specific BUSCO dataset, for D. melanogaster as the control and for the parthenogenetic and sexually reproducing D. mercatorum genome assemblies. This image was originally published by Sperling et al.1
4. Extract and concentrate DNA.
a. Add 240 µL of lysis buffer to the 1.5 mL tube containing the fly powder and then transfer the lysis-fly solution into a 15 mL tube (on ice).
i. Repeat 3 times to ensure that all the material is transferred.
ii. Add the remaining lysis buffer to the tube.
b. Immediately vortex for 5 s.
c. Incubate lysate at 37°C for 1 h, gently inverting the tube after 30 min.
d. Digest protein.
i. Add 500 µL QIAGEN Proteinase K and gently invert the tube 5 times.
ii. Incubate at 50°C for 2 h, gently inverting the tube 5 times after 1 h.
iii. 10 min before the end of the lysis incubation time, prepare a Genomic-tip by equilibrating it with 4 mL Buffer QBT.
e. Isolate DNA.
i. Vortex the lysed sample for 5-10 s at maximum speed.
ii. Gently pour the sample into five 2 mL tubes.
iii. Centrifuge at 5,000 × g for 10 min at 4°C.
iv. Immediately after the centrifugation, pour the samples onto the equilibrated Genomic-tip.
v. Wash twice with 7.5 mL Buffer QC.
vi. Pre-warm 5 mL Buffer QF to 50°C.
vii. Elute DNA with the 5 mL warm Buffer QF into a 50 mL tube.
f. Concentrate DNA.
i. Add 3.5 mL isopropanol to the sample.
ii. Gently invert 10-20 times.
iii. Pour sample into 2.0 mL tubes and centrifuge immediately at 10,000 × g for 30